
Fix broken multi-GPU path in run_solver #67

Open
sanjanag wants to merge 2 commits into linkedin:master from sanjanag:fix-multigpu-run-solver

Conversation

sanjanag (Contributor) commented Apr 27, 2026

Summary

The multi-GPU code path in run_solver was broken: any call with
compute_device_num > 1 raised a TypeError at runtime. Multi-GPU /
multi-node usage doesn't fit run_solver's single-process API and
requires a per-rank construction pattern. This PR replaces the broken
dispatch with an explicit NotImplementedError, so users get an
actionable message instead of an internal TypeError, and points them
to the working pattern in the distributed tests.
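
For readers who want a picture of the guard described above, here is a minimal sketch. It is not the actual diff: `run_solver_sketch` is a hypothetical stand-in for run_solver, the error message wording is assumed, and the single-GPU branch is reduced to a placeholder return value. Only `compute_device_num`, `objective_kwargs`, and the test file path come from this PR.

```python
from typing import Any, Dict, Optional


def run_solver_sketch(
    compute_device_num: int = 1,
    objective_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for run_solver; only the device-count guard is modeled."""
    if compute_device_num > 1:
        # Fail early with guidance instead of letting a mismatched
        # constructor call surface as an internal TypeError.
        raise NotImplementedError(
            "run_solver supports a single process only. For multi-GPU or "
            "multi-node runs, construct the objective on each rank; see "
            "tests/distributed/test_matching_distributed.py."
        )
    # Placeholder for the unchanged single-GPU path (assumed shape, not the real result).
    return {
        "compute_device_num": compute_device_num,
        "objective_kwargs": dict(objective_kwargs or {}),
    }
```

Raising before any objective is constructed keeps the failure independent of the distributed class's signature, so later signature changes cannot reintroduce the confusing error.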

Test plan

  • Existing tests pass (pytest tests/ --ignore=tests/distributed)
  • Multi-GPU path now raises NotImplementedError with guidance
  • Single-GPU path unchanged
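
The second and third items could be checked with tests along these lines. This is a sketch under assumptions: `run_solver_stub` is a hypothetical stub so the example runs on its own, and its return value is made up; a real test would import run_solver and pass its actual inputs. The functions are named so pytest would collect them, but they use plain assertions only.

```python
def run_solver_stub(compute_device_num=1):
    """Hypothetical stub mirroring only the behavior this PR describes."""
    if compute_device_num > 1:
        raise NotImplementedError(
            "Multi-GPU is not supported by run_solver; see "
            "tests/distributed/test_matching_distributed.py."
        )
    return "solved"  # placeholder for the single-GPU result


def test_multi_gpu_raises_with_guidance():
    try:
        run_solver_stub(compute_device_num=2)
    except NotImplementedError as exc:
        # The message should point users at the working distributed pattern.
        assert "tests/distributed" in str(exc)
    else:
        raise AssertionError("expected NotImplementedError")


def test_single_gpu_unchanged():
    assert run_solver_stub(compute_device_num=1) == "solved"
```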

sanjanag and others added 2 commits April 27, 2026 12:14
The compute_device_num > 1 branch passed kwargs that don't match
MatchingSolverDualObjectiveFunctionDistributed's signature, so any
multi-GPU run_solver call raised TypeError immediately. Multi-GPU /
multi-node usage requires a per-rank construction pattern that doesn't
fit run_solver's single-process inputs. Raise NotImplementedError
pointing to tests/distributed/test_matching_distributed.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the mlflow context wrapper and the dead invert_jacobi_precondition
block from run_solver. Forward objective_kwargs to the matching objective
so callers can configure it via ObjectiveArgs the same way the benchmark
script does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
